Silhouette Score — Clustering Validation (From Scratch)#
The silhouette score is an internal clustering metric: it evaluates cluster quality using only the data and the assigned cluster labels (no ground-truth classes needed).
It compares, for each sample, how close it is to points in its own cluster (cohesion) versus how close it is to points in the nearest other cluster (separation).
What you’ll learn#
The exact definition of \(a(i)\), \(b(i)\), and the silhouette coefficient \(s(i)\)
A NumPy implementation of silhouette_samples / silhouette_score
How to read a silhouette plot (per-sample diagnostics)
How to use the silhouette score to select \(k\) for a simple K-means implementation
Pros/cons and common pitfalls (distance metrics, scaling, cluster shape)
Quick import (scikit-learn)#
from sklearn.metrics import silhouette_score, silhouette_samples
import numpy as np
import plotly.express as px
import plotly.graph_objects as go
import os
import plotly.io as pio
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score as skl_silhouette_score
from sklearn.metrics import silhouette_samples as skl_silhouette_samples
pio.templates.default = "plotly_white"
pio.renderers.default = os.environ.get("PLOTLY_RENDERER", "notebook")
np.set_printoptions(precision=3, suppress=True)
rng = np.random.default_rng(7)
1) Intuition: “am I closer to my own cluster?”#
For a point \(x_i\):
if it’s much closer to points in its own cluster than to points in other clusters → silhouette near 1
if it lies on the border between clusters → silhouette near 0
if it’s actually closer to another cluster than to its assigned one → silhouette becomes negative
This makes the silhouette score useful not only as a single number, but also as a diagnostic: you can find which points are likely misclustered.
2) Definition (math + notation)#
Let:
\(X = \{x_i\}_{i=1}^n\) with \(x_i \in \mathbb{R}^d\)
cluster labels \(y_i \in \{0, 1, \dots, k-1\}\)
a distance function \(d(x_i, x_j)\) (often Euclidean)
Define the index set of cluster \(c\):

\[ C_c = \{\, i : y_i = c \,\} \]

For a given sample \(i\) with assigned cluster \(c = y_i\):
2.1 Intra-cluster distance \(a(i)\) (cohesion)#

\[ a(i) = \frac{1}{|C_c| - 1} \sum_{j \in C_c,\, j \neq i} d(x_i, x_j) \]

If \(|C_c| = 1\) (a singleton cluster), \(a(i)\) is undefined; in practice we set the sample’s silhouette to 0.
2.2 Nearest-cluster distance \(b(i)\) (separation)#

Mean distance from \(i\) to another cluster \(c' \neq c\):

\[ \bar d(i, c') = \frac{1}{|C_{c'}|} \sum_{j \in C_{c'}} d(x_i, x_j) \]

Then:

\[ b(i) = \min_{c' \neq c} \bar d(i, c') \]
2.3 Silhouette coefficient#

\[ s(i) = \frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}} \]

\(s(i) \in [-1, 1]\)
larger is better
2.4 Mean silhouette score#

\[ \bar{s} = \frac{1}{n} \sum_{i=1}^{n} s(i) \]

The score is only defined when \(2 \le k \le n-1\) (you need at least 2 clusters, and not a separate cluster for every point).
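A quick numeric check of the formulas above, with made-up values for \(a(i)\) and \(b(i)\):

```python
# Suppose a point is on average 1.0 from its own cluster
# and 3.0 from the nearest other cluster:
a_i, b_i = 1.0, 3.0
s_i = (b_i - a_i) / max(a_i, b_i)
print(s_i)  # (3 - 1) / 3 ≈ 0.667: comfortably inside its cluster

# A misassigned point: closer to the other cluster than to its own
a_i, b_i = 3.0, 1.0
print((b_i - a_i) / max(a_i, b_i))  # (1 - 3) / 3 ≈ -0.667
```

Swapping \(a(i)\) and \(b(i)\) simply flips the sign, which is why negative values signal likely misassignment.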
3) From-scratch NumPy implementation#
A straightforward implementation computes the full pairwise distance matrix \(D \in \mathbb{R}^{n\times n}\) where \(D_{ij} = d(x_i, x_j)\).
Time: \(\mathcal{O}(n^2 d)\) to build distances + additional work per cluster
Memory: \(\mathcal{O}(n^2)\) for the distance matrix
For very large \(n\), you typically need approximations (subsampling, nearest-neighbor graphs, or chunked computations).
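As a rough sketch of the subsampling idea (the `silhouette_subsampled` helper name is ours; note that scikit-learn's `silhouette_score` exposes the same idea directly via its `sample_size` argument):

```python
import numpy as np
from sklearn.metrics import silhouette_score

def silhouette_subsampled(X, labels, sample_size=200, seed=0):
    """Estimate the mean silhouette from a random subsample: O(sample_size^2) memory."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    n = labels.shape[0]
    idx = rng.choice(n, size=min(sample_size, n), replace=False)
    return silhouette_score(X[idx], labels[idx])

# Two well-separated blobs: the subsampled estimate should track the full score.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.3, (500, 2)), rng.normal(4.0, 0.3, (500, 2))])
labels = np.repeat([0, 1], 500)
print(silhouette_subsampled(X, labels, sample_size=100))
print(silhouette_score(X, labels))  # exact value, for comparison
```

The estimate is noisy for small subsamples and can fail if the subsample misses a cluster entirely, so it is a pragmatic approximation, not a replacement for the exact score.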
def standardize(X, eps=1e-12):
X = np.asarray(X, dtype=float)
mean = X.mean(axis=0, keepdims=True)
std = X.std(axis=0, ddof=0, keepdims=True)
return (X - mean) / (std + eps), mean, std
def pairwise_euclidean_distances(X):
"""Compute D[i, j] = ||x_i - x_j||_2 for all pairs.
Uses: ||a-b||^2 = ||a||^2 + ||b||^2 - 2 a·b
"""
X = np.asarray(X, dtype=float)
sq_norms = np.sum(X * X, axis=1, keepdims=True) # (n, 1)
D2 = sq_norms + sq_norms.T - 2.0 * (X @ X.T)
D2 = np.maximum(D2, 0.0) # numerical guard
return np.sqrt(D2)
def silhouette_samples_from_distance_matrix(D, labels):
"""Silhouette coefficient per sample from a full distance matrix.
Args:
D: (n, n) symmetric distance matrix with zeros on diagonal
labels: (n,) cluster labels
Returns:
s: (n,) silhouette per sample
"""
labels = np.asarray(labels)
n = labels.shape[0]
D = np.asarray(D, dtype=float)
if D.shape != (n, n):
raise ValueError(f"D must have shape (n,n) = {(n, n)}, got {D.shape}")
unique_labels, inv = np.unique(labels, return_inverse=True)
k = unique_labels.size
if k < 2 or k >= n:
raise ValueError(
f"silhouette is defined only for 2 <= n_clusters <= n_samples-1; got n_clusters={k}, n_samples={n}"
)
cluster_indices = [np.where(inv == j)[0] for j in range(k)]
# a(i): mean intra-cluster distance (exclude self)
a = np.zeros(n, dtype=float)
singleton_mask = np.zeros(n, dtype=bool)
for j, idx in enumerate(cluster_indices):
m = idx.size
if m <= 1:
singleton_mask[idx] = True
continue
a[idx] = D[np.ix_(idx, idx)].sum(axis=1) / (m - 1)
# mean_dist[i, j] = mean_{p in cluster j} D[i, p]
mean_dist = np.empty((n, k), dtype=float)
for j, idx in enumerate(cluster_indices):
mean_dist[:, j] = D[:, idx].mean(axis=1)
# b(i): distance to nearest other cluster
mean_dist[np.arange(n), inv] = np.inf
b = mean_dist.min(axis=1)
denom = np.maximum(a, b)
s = np.zeros(n, dtype=float)
nonzero = denom > 0
s[nonzero] = (b[nonzero] - a[nonzero]) / denom[nonzero]
# sklearn convention: silhouette = 0 for singleton clusters
s[singleton_mask] = 0.0
return s
def silhouette_score_from_distance_matrix(D, labels):
return float(silhouette_samples_from_distance_matrix(D, labels).mean())
def silhouette_score_numpy(X, labels):
D = pairwise_euclidean_distances(X)
return silhouette_score_from_distance_matrix(D, labels)
def silhouette_point_details(D, labels, i):
"""Return a(i), b(i), and mean distances to each cluster for sample i."""
labels = np.asarray(labels)
unique_labels, inv = np.unique(labels, return_inverse=True)
cluster_indices = [np.where(inv == j)[0] for j in range(unique_labels.size)]
i_cluster = inv[i]
idx_self = cluster_indices[i_cluster]
mean_to_clusters = np.array([D[i, idx].mean() for idx in cluster_indices], dtype=float)
if idx_self.size <= 1:
a = 0.0
else:
# the sum includes the zero self-distance, so dividing by (m - 1) averages over the other members
a = float(D[i, idx_self].sum() / (idx_self.size - 1))
tmp = mean_to_clusters.copy()
tmp[i_cluster] = np.inf
b_cluster = int(np.argmin(tmp))
b = float(tmp[b_cluster])
return {
"a": a,
"b": b,
"mean_to_clusters": mean_to_clusters,
"cluster_labels": unique_labels,
"i_cluster_label": unique_labels[i_cluster],
"b_cluster_label": unique_labels[b_cluster],
}
4) A tiny example (including a negative silhouette)#
We’ll create two compact clusters in 2D, then intentionally mislabel one point. That point should end up with a negative silhouette.
# Two tight clusters + one misassigned point
X_toy = np.array(
[
[0.0, 0.0],
[0.0, 1.0],
[1.0, 0.0],
[1.0, 1.0],
[5.0, 0.0],
[5.0, 1.0],
[6.0, 0.0],
[6.0, 1.0],
[5.0, 0.5], # near cluster 1...
],
dtype=float,
)
labels_toy = np.array([0, 0, 0, 0, 1, 1, 1, 1, 0], dtype=int) # ...but labeled as cluster 0
D_toy = pairwise_euclidean_distances(X_toy)
s_toy = silhouette_samples_from_distance_matrix(D_toy, labels_toy)
score_toy = float(s_toy.mean())
# sanity check vs scikit-learn
skl_s_toy = skl_silhouette_samples(X_toy, labels_toy)
skl_score_toy = float(skl_silhouette_score(X_toy, labels_toy))
print("from scratch mean silhouette:", score_toy)
print("sklearn mean silhouette :", skl_score_toy)
print("max |per-sample diff| :", float(np.max(np.abs(s_toy - skl_s_toy))))
i = len(X_toy) - 1
details = silhouette_point_details(D_toy, labels_toy, i)
print("\nselected point index:", i)
print("assigned cluster:", int(labels_toy[i]))
print(f"a(i) (own cluster mean dist): {details['a']:.3f}")
print(f"b(i) (nearest other cluster): {details['b']:.3f}")
print(f"s(i) : {s_toy[i]:.3f}")
# scatter plot
fig = px.scatter(
x=X_toy[:, 0],
y=X_toy[:, 1],
color=labels_toy.astype(str),
title="Toy clustering (one point is misassigned)",
labels={"color": "cluster"},
width=750,
height=450,
)
fig.add_trace(
go.Scatter(
x=[X_toy[i, 0]],
y=[X_toy[i, 1]],
mode="markers",
name="selected point",
marker=dict(symbol="star", size=18, color="black", line=dict(width=1, color="white")),
)
)
fig.update_yaxes(scaleanchor="x", scaleratio=1)
fig.show()
# mean distance from selected point to each cluster
cluster_labels = details["cluster_labels"]
mean_to = details["mean_to_clusters"]
x_names = [f"cluster {c}" for c in cluster_labels]
fig = go.Figure()
fig.add_trace(go.Bar(x=x_names, y=mean_to, name="mean distance"))
fig.add_trace(
go.Scatter(
x=x_names,
y=[details["a"]] * len(x_names),
mode="lines",
name="a(i)",
line=dict(color="royalblue", dash="dash"),
)
)
fig.add_trace(
go.Scatter(
x=x_names,
y=[details["b"]] * len(x_names),
mode="lines",
name="b(i)",
line=dict(color="firebrick", dash="dash"),
)
)
fig.update_layout(
title="Mean distance from the selected point to each cluster",
yaxis_title="mean distance",
width=750,
height=420,
)
fig.show()
from scratch mean silhouette: 0.5004736300915591
sklearn mean silhouette : 0.5004736300915591
max |per-sample diff| : 0.0
selected point index: 8
assigned cluster: 0
a(i) (own cluster mean dist): 4.528
b(i) (nearest other cluster): 0.809
s(i) : -0.821
5) The silhouette plot (per-sample diagnostics)#
A silhouette plot shows all \(s(i)\) values, grouped by cluster and sorted within each cluster. This helps you see:
whether some clusters contain many borderline points (\(s(i) \approx 0\))
whether a cluster has many negatives (likely misclustered)
whether cluster sizes are extremely imbalanced
def plot_silhouette(s, labels, title="Silhouette plot"):
s = np.asarray(s, dtype=float)
labels = np.asarray(labels)
unique = np.unique(labels)
colors = px.colors.qualitative.Set2
fig = go.Figure()
y_offset = 0
for j, lab in enumerate(unique):
vals = np.sort(s[labels == lab])
y = np.arange(y_offset, y_offset + vals.size)
fig.add_trace(
go.Bar(
x=vals,
y=y,
orientation="h",
marker_color=colors[j % len(colors)],
name=f"cluster {lab}",
hovertemplate="s=%{x:.3f}<extra></extra>",
)
)
y_offset += vals.size + 6
mean_s = float(np.mean(s))
fig.add_trace(
go.Scatter(
x=[mean_s, mean_s],
y=[-5, y_offset],
mode="lines",
line=dict(color="black", dash="dash"),
showlegend=False,
hoverinfo="skip",
)
)
fig.add_annotation(x=mean_s, y=y_offset, text=f"mean={mean_s:.3f}", showarrow=False, yshift=10)
fig.update_layout(
title=title,
xaxis_title="silhouette value",
yaxis_title="samples (grouped by cluster)",
xaxis=dict(range=[-1, 1]),
width=900,
height=450,
)
fig.update_yaxes(showticklabels=False)
return fig
fig = plot_silhouette(s_toy, labels_toy, title="Toy silhouette plot")
fig.show()
6) Using silhouette to select \(k\) in K-means (from scratch)#
The silhouette score is not a smooth, differentiable loss in terms of cluster centroids (assignments change discretely). So you typically don’t optimize it with gradient descent.
Instead, it’s commonly used for model selection:
choose a hyperparameter like the number of clusters \(k\)
compare different clustering algorithms / distance metrics
pick the best run among multiple random initializations
Below is a minimal Lloyd-style K-means implementation + a small grid search over \(k\) where we pick the best \(k\) by maximizing the mean silhouette score.
def kmeans_lloyd(X, k, rng, max_iter=100, tol=1e-6):
"""Very small K-means (Lloyd's algorithm) implementation."""
X = np.asarray(X, dtype=float)
n, d = X.shape
if not (1 <= k <= n):
raise ValueError("k must be in [1, n_samples]")
centroids = X[rng.choice(n, size=k, replace=False)]
for _ in range(max_iter):
# assignment step
D2 = np.sum((X[:, None, :] - centroids[None, :, :]) ** 2, axis=2) # (n, k)
labels = np.argmin(D2, axis=1)
# update step
new_centroids = centroids.copy()
for j in range(k):
mask = labels == j
if not np.any(mask):
# handle empty cluster by reinitializing to a random point
new_centroids[j] = X[rng.integers(n)]
else:
new_centroids[j] = X[mask].mean(axis=0)
shift = np.linalg.norm(new_centroids - centroids)
centroids = new_centroids
if shift <= tol:
break
# final assignment with the final centroids
D2 = np.sum((X[:, None, :] - centroids[None, :, :]) ** 2, axis=2)
labels = np.argmin(D2, axis=1)
return labels, centroids
def best_kmeans_by_silhouette(X, D, k, n_init=10, seed=0, max_iter=100):
rng_local = np.random.default_rng(seed)
best_score = -np.inf
best_labels = None
best_centroids = None
for _ in range(n_init):
labels, centroids = kmeans_lloyd(X, k, rng_local, max_iter=max_iter)
try:
score = silhouette_score_from_distance_matrix(D, labels)
except ValueError:
# can happen if a run collapses into 1 effective cluster
continue
if score > best_score:
best_score = score
best_labels = labels
best_centroids = centroids
if best_labels is None:
raise RuntimeError("All initializations produced an invalid clustering")
return float(best_score), best_labels, best_centroids
# Synthetic 2D dataset (3 Gaussian blobs with different spreads + a few outliers)
n_per = 150
centers = np.array([[-2.5, -0.5], [0.0, 2.8], [2.7, -0.2]])
scales = np.array([0.35, 0.55, 0.75])
X_raw = np.vstack(
[rng.normal(loc=centers[j], scale=scales[j], size=(n_per, 2)) for j in range(centers.shape[0])]
)
outliers = rng.uniform(low=[-4.0, -3.0], high=[4.0, 4.0], size=(25, 2))
X_raw = np.vstack([X_raw, outliers])
fig = px.scatter(
x=X_raw[:, 0],
y=X_raw[:, 1],
title="Synthetic dataset (unlabeled)",
width=750,
height=450,
)
fig.update_yaxes(scaleanchor="x", scaleratio=1)
fig.show()
# IMPORTANT: silhouette uses distances, so feature scaling matters.
X, _, _ = standardize(X_raw)
D = pairwise_euclidean_distances(X)
ks = list(range(2, 9))
scores = []
models = {}
for k in ks:
score, labels, centroids = best_kmeans_by_silhouette(X, D, k, n_init=8, seed=100 + k, max_iter=80)
scores.append(score)
models[k] = (labels, centroids)
k_best = int(ks[int(np.argmax(scores))])
labels_best, centroids_best = models[k_best]
print("best k:", k_best)
print("best mean silhouette:", float(np.max(scores)))
fig = go.Figure()
fig.add_trace(go.Scatter(x=ks, y=scores, mode="lines+markers", name="mean silhouette"))
fig.add_trace(
go.Scatter(
x=[k_best],
y=[float(np.max(scores))],
mode="markers",
marker=dict(size=12, color="black"),
name="chosen k",
)
)
fig.update_layout(
title="Silhouette score vs number of clusters (K-means from scratch)",
xaxis_title="k",
yaxis_title="mean silhouette",
width=800,
height=420,
)
fig.show()
fig = px.scatter(
x=X[:, 0],
y=X[:, 1],
color=labels_best.astype(str),
title=f"K-means from scratch (k={k_best})",
labels={"color": "cluster"},
width=750,
height=450,
)
fig.add_trace(
go.Scatter(
x=centroids_best[:, 0],
y=centroids_best[:, 1],
mode="markers",
name="centroids",
marker=dict(symbol="x", size=12, color="black"),
)
)
fig.update_yaxes(scaleanchor="x", scaleratio=1)
fig.show()
best k: 3
best mean silhouette: 0.7382865700820038
s_best = silhouette_samples_from_distance_matrix(D, labels_best)
fig = plot_silhouette(s_best, labels_best, title=f"Silhouette plot (k={k_best})")
fig.show()
7) Practical usage with scikit-learn#
In practice you’ll often compute silhouette for a fitted clustering model:
labels = model.fit_predict(X)
score = silhouette_score(X, labels)
Below we compute silhouette for scikit-learn’s KMeans and verify that our NumPy implementation matches sklearn.metrics.silhouette_score for the same labels.
kmeans = KMeans(n_clusters=k_best, n_init=10, random_state=7)
labels_skl = kmeans.fit_predict(X)
score_skl = float(skl_silhouette_score(X, labels_skl))
score_np = silhouette_score_numpy(X, labels_skl)
print("sklearn KMeans silhouette:", score_skl)
print("our silhouette (same labels):", score_np)
sklearn KMeans silhouette: 0.7380559615275238
our silhouette (same labels): 0.7380559615275238
Pros / Cons / Pitfalls#
Pros#
No ground truth needed (internal validation)
Interpretable scale in \([-1, 1]\) and available per-sample (easy to diagnose bad assignments)
Works with any distance metric (Euclidean, cosine, etc.) if it matches your notion of similarity
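For example, silhouette with cosine distance can be computed from a precomputed distance matrix via scikit-learn's `metric="precomputed"` option (the toy data below is ours, chosen so the clusters differ by direction rather than location):

```python
import numpy as np
from sklearn.metrics import silhouette_score
from sklearn.metrics.pairwise import cosine_distances

rng = np.random.default_rng(0)
# two clusters of points along roughly orthogonal directions
X = np.vstack([
    rng.normal([3.0, 0.0], 0.2, (20, 2)),
    rng.normal([0.0, 3.0], 0.2, (20, 2)),
])
labels = np.repeat([0, 1], 20)

D = cosine_distances(X)  # D[i, j] = 1 - cos(x_i, x_j)
score = silhouette_score(D, labels, metric="precomputed")
print(score)  # high: the two direction clusters are nearly orthogonal
```

The same `metric="precomputed"` route works for any custom distance matrix, which is also how you would plug an alternative metric into the from-scratch `silhouette_samples_from_distance_matrix` above.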
Cons / limitations#
Expensive for large datasets: exact computation needs all pairwise distances (\(\mathcal{O}(n^2)\))
Assumes distances are meaningful; in high dimensions distances can concentrate
Often favors compact / convex clusters; can be misleading for non-globular structure (e.g., “two moons”)
Sensitive to feature scaling, outliers, and clusters with very different sizes/densities
Not a training loss: it’s generally non-differentiable, so it’s used for model selection, not gradient descent
Practical diagnostics#
Don’t rely on the mean alone: inspect the silhouette plot.
Many negative values → points are likely assigned to the wrong cluster or \(k\) is wrong.
Singleton clusters get silhouette 0 by convention; if they appear, treat it as a warning sign.
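Per-sample values make these diagnostics concrete. A minimal sketch (the toy data is ours) that flags candidates for reassignment:

```python
import numpy as np
from sklearn.metrics import silhouette_samples

X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [5.0, 0.0], [5.0, 1.0], [0.5, 0.5]])
labels = np.array([0, 0, 0, 1, 1, 1])  # the last point sits in cluster 0's region

s = silhouette_samples(X, labels)
suspects = np.where(s < 0)[0]  # likely misclustered points
print(suspects)  # → [5]
```

In a real pipeline you might re-examine these points, try a different \(k\), or report them as low-confidence assignments.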
Exercises#
Implement silhouette using Manhattan distance and compare results.
Create a dataset with non-convex clusters (two moons) and see how silhouette behaves with K-means.
References#
P. J. Rousseeuw (1987), Silhouettes: a graphical aid to the interpretation and validation of cluster analysis.
scikit-learn docs: https://scikit-learn.org/stable/modules/generated/sklearn.metrics.silhouette_score.html